Single-cell Hi-C data enhancement with deep residual and generative adversarial networks

Abstract Motivation The spatial genome organization of a eukaryotic cell is important for its function. The development of single-cell technologies for probing the 3D genome conformation, especially single-cell chromosome conformation capture techniques, has enabled us to understand genome function better than before. However, due to extreme sparsity and high noise associated with single-cell Hi-C data, it is still difficult to study genome structure and function using the HiC-data of one single cell. Results In this work, we developed a deep learning method ScHiCEDRN based on deep residual networks and generative adversarial networks for the imputation and enhancement of Hi-C data of a single cell. In terms of both image evaluation and Hi-C reproducibility metrics, ScHiCEDRN outperforms the four deep learning methods (DeepHiC, HiCPlus, HiCSR, and Loopenhance) on enhancing the raw single-cell Hi-C data of human and Drosophila. The experiments also show that it can generate single-cell Hi-C data more suitable for identifying topologically associating domain boundaries and reconstructing 3D chromosome structures than the existing methods. Moreover, ScHiCEDRN’s performance generalizes well across different single cells and cell types, and it can be applied to improving population Hi-C data. Availability and implementation The source code of ScHiCEDRN is available at the GitHub repository: https://github.com/BioinfoMachineLearning/ScHiCEDRN.


Introduction
Analyzing and modeling 3D conformations (structures) of genomes and chromosomes is important for understanding genome function. Hi-C genome conformation data, capturing the all-versus-all interactions between DNA fragments in chromosomes or genomes, are a set of paired-end sequence reads each indicating which two locations (fragments) on a genome are in contact. It can be represented as a contact matrix/map (M) (Lieberman-Aiden et al. 2009, Trieu andCheng 2016), each entry M [i, j] stores the number of reads, i.e. the interaction frequency (IF) denoting how frequently fragment i and fragment j on the same chromosome for intrachromosomal matrix (or two different chromosomes for inter-chromosomal matrix) are in contact. The IF between two fragments is usually considered inversely related to their spatial distance in the 3D genome structure (Lesne et al. 2014, Varoquaux et al. 2014, Trieu and Cheng 2017. Therefore, Hi-C data can be used to reconstruct 3D chromosome and genome structures by satisfying the distances or contacts between chromosomal fragments derived from the data. However, most existing 3D genome modeling methods focus on building chromosome/genome structures of a population of cells derived from bulk Hi-C data (usually millions of cells). The reconstructed structures are the averaged structures of a population or sub-population of cells, which cannot capture the variability of individual cells well. Although individual cells of the same type usually share some common structural properties, they also possess different interactions between topologically associated domains (TADs), different chromosomal loops within TADs, and different assignments of TADs to chromosomal compartments that play important roles in gene transcription, histone methylation, cell cycle, and cell development (Nagano et al. 2013, Nagano et al. 2017, Tan et al. 2018. Therefore, it is important to analyse and model the 3D genome/chromosome structures for the investigation of cell variability using single-cell Hi-C data. However, the task is very challenging due to the extreme sparsity of single-cell Hi-C data and the high noise in the data (Nagano et al. 2015) in comparison with bulk Hi-C data.
Most single-cell Hi-C data tend to contain <5% of chromosomal contacts captured by bulk Hi-C data produced from millions of cells (Paulsen et al. 2015), which provide much fewer distance restraints to build 3D genome models. Moreover, because single-cell Hi-C data are obtained from a single cell during experiments, it usually contains some noise that affects the 3D genome construction. There is a significant challenge for the 3D genome modeling methods to handle missing and noisy contact information in single-cell Hi-C data. Therefore, it is necessary to improve the quality of single-cell Hi-C data.
To date, the approaches to enhancing and analyzing the single-cell Hi-C data can be classified into two categories (Galitsyna and Gelfand 2021). One is to perform imputation analysis using multiple sets of single-cell Hi-C data. Another is to enhance one set of single-cell Hi-C data without using any extra information. Most existing methods fall into the first category. One such method (Paulsen et al. 2015) imputes missing contacts by using the distance networks of chromosome fragments and linear convolution of chromosomal contacts. Another method (Rosenthal et al. 2019) recovers missing data by combining bulk Hi-C data contact matrices with single-cell Hi-C data. Zhou et al. (2019) populate chromosomal contact matrices (maps) with contacts generated by random walk. scVI-3D (Zheng et al. 2021) denoises singlecell Hi-C data by a variational autoencoder framework.  group single-cell contact maps of different cell types together to produce multiple single-cell Hi-C data maps by a hypergraph neural network. The methods in the first category require multiple single-cell datasets or population Hi-C data with some extra information, such as cell type relationships as input.
In contrast, there are much fewer methods in the second category that directly denoise the Hi-C data of a single cell without using any extra information. One such method DeepLoop ) uses a deep learning model to infer chromatin interactions in single-cell or sparsely alleleresolved Hi-C data at kilobase resolution. The advantage of this kind of methods is that they can be applied to any singlecell Hi-C data including the ones that do not have extra information. It is less burdensome to use these methods because users do not need to obtain extra data.
Some deep learning methods, such as HiCPlus , HiCSR (Dimmick 2020), VEHiCLE (Highsmith and Cheng 2021), DeepHiC (Hong et al. 2020), and HiCARN (Hicks and Oluwadare 2022) were developed for imputing raw bulk Hi-C data but had never been tested on a raw single-cell Hi-C data. To impute the raw Hi-C contact maps of a single cell without using any extra information, we developed ScHiCEDRN based on the deep residual network and generative adversarial network that were used in improving image data (Lim et al. 2017). Different from the existing bulk Hi-C data enhancement methods (DeepHiC and HiCSR) based on the generative adversarial network (GAN) framework, ScHiCEDRN combines customized deep residual networks and convolutional neural networks (CNN) to create a generator to generate the enhanced data from raw lowcoverage single-cell Hi-C data. It consistently performs better than the existing deep learning methods on several single-cell Hi-C test datasets in terms of multiple standard evaluation metrics. Moreover, it outperforms the other deep learning methods in enhancing data for identifying TAD boundaries and reconstructing 3D chromosome structures.

Dataset preparation
Two different single-cell Hi-C datasets of two cell lines and their corresponding population Hi-C datasets were obtained from the Restructured Gene Expression Omnibus (GEO) database (Edgar et al. 2002, Barrett et al. 2013. One singlecell dataset comes from the Drosophila melanogaster organism with seven chromosomes (chr2L, chr2R, chr3L, chr3R, chr4, chrX, and chrM) (Ulianov et al. 2021) (GEO accession number: GSE131811). Another single-cell dataset is from Homo sapiens (source name: Human frontal cortex) with 24 chromosomes (Chromosomes 1-22, X, and Y) (Lee et al. 2019) (GEO accession number: GSE130711). Even though ScHiCEDRN was developed to denoise single-cell Hi-C data, it can be also applied to population (bulk) Hi-C data. So, we also obtained the two population Hi-C datasets in order to compare it with the existing methods initially designed for enhancing population Hi-C data in addition to benchmarking them on the two single-cell Hi-C datasets.
Because the two single-cell Hi-C datasets contain the data of many single cells, we randomly chose the single-cell Hi-C data of 10 single cells of the human cell line and the single-cell Hi-C data of 10 single cells of the Drosophila cell line from the two datasets, respectively, to benchmark the single-cell Hi-C data enhancement methods.
The 40 kb resolution that can be well supported by the sparse data was chosen to generate the chromosomal contact maps. The single-cell Hi-C data of Chromosomes 1, 3,5,7,8,9,11,13,15,16,17,19,21, and 22 of one human cell (called Human cell 1) are used as the training dataset (called human_cell_1_training_data), the single-cell Hi-C data of Chromosomes 4, 14, 18, and 20 of the same one cell (i.e. Human cell 1) are used as the validation dataset (called human_cell_1_validation_data). The training data were used to train our method and all the external methods used in the comparison. The validation dataset was used to monitor the training process, select trained models, and avoid overfitting. The trained methods were then blindly tested on the test data.
The single-cell Hi-C data of Chromosomes 2, 6, 10, and 12 of Human cell 1, the single-cell Hi-C data of Chromosomes 2, 6, 10, and 12 of another two randomly chosen human cells (Human cells 2 and 3), and the single-cell Hi-C data of the two chromosomes chr2L and the chrX of two randomly chosen Drosophila cells were used as the test datasets (called human_cell_1_test_data, human_cells_2_3_test_data, and drosophila_cells_test_data, respectively). These three datasets can test if ScHiCEDRN can generalize well to different chromosomes of the same cell, to different cells of the same species, and to different species. It is worth noting that only the results on the single Hi-C data of these randomly chosen human cells and Drosophila cells are reported in this study because the similar results on the other randomly selected human/Drosophila single cells that we tested were obtained.
In addition to the single-cell test datasets, the population Hi-C data of the human cells (GEO accession number: GSE130711) and the population Hi-C data of all Drosophila cells (GEO accession number: GSE131811) are used as two extra test datasets: human_population_test_data and droso-phia_population_test_data to evaluate if the method can generalize to population (bulk) Hi-C data.
All training, validation, and test datasets were preprocessed by the same method used by DeepHiC (Hong et al. 2020) but with different down-sampled ratios. In this study, the data were randomly down-sampled to 75%, 45%, 10%, and 2% of their original raw reads, respectively, to create the datasets consisting of chromosome contact matrices (maps) at different sparse levels [i.e. low-resolution (LR) contact matrices] as input. The original data without down-sampling [i.e. highresolution (HR) contact matrices] are used as the labels. The labels of the training dataset were used to train the method to predict HR data from LR input data. There are 14 downsampled chromosomal contact matrices in the training dataset, 4 down-sampled chromosomal contact matrices in the validation dataset, 4 down-sampled chromosomal contact matrices in each human single-cell test dataset (i.e. human_ cell_1_test_data and human_cell2_2_3_test_data), 2 downsampled chromosomal contact matrices in each Drosophila single-cell test dataset (i.e. drosophia_cells_test_data), 24 down-sampled chromosomal contact matrices in human population Hi-C test dataset (i.e. human_population_test_data), and 7 down-sampled chromosomal contact matrices in Drosophila population Hi-C test dataset (i.e. drosophila_po-pulation_test_data) for each down-sampling ratio.

ScHiCEDRN architecture
We designed ScHiCEDRN based on the residual blocks in Fig. 1A and the GAN framework, largely inspired by an image super-resolution work (Lim et al. 2017). This residual block, different from traditional ones containing batch normalization layers, is a customized version of the residual block used for improving image resolution (Lim et al. 2017), which was shown to work better than the traditional one on some image datasets (Lim et al. 2017). It consists of a convolutional layer, a ReLU activation layer, a convolutional layer, and a multiplication layer to generate the residual output. The multiplication layer is a scaling factor layer with a scaling factor of 0.1 to scale the output from its preceding convolutional layer, which is important for stabilizing the training for deep architectures with an increasing number of feature maps in neural network layers (Szegedy et al. 2017). The residual output from the multiplication layer is added to the input for the residual block to generate the output of the residual block. The number of feature channels in all convolutional layers is 256 for both the in-and out-channels.
The generator of ScHiCEDRN ( Fig. 1B) consists of 32 residual blocks and two 3 Â 3 convolution layers (one before and one after the residual blocks). It takes a LR single-cell Hi-C contact map as input to generate (predict) a HR single-cell Hi-C contact map as output.
The discriminator of ScHiCEDRN (Fig. 1B) contains a series of seven convolutional blocks preceded by a 3 Â 3 convolutional layer and leaky ReLU layer, followed by a 3 Â 3 convolutional layer and a sigmoid activation layer. Each convolutional block has one convolutional layer, one batch normalization layer, and one leaky ReLU layer. Specifically, the batch normalization layers in the seven blocks are preceded by a 4 Â 4 convolutional layer in the first, third, and fifth blocks, a 3 Â 3 convolutional layer in the second, fourth, sixth, and seventh blocks, respectively, and followed by a leaky ReLU layer. The number of out-channels is equal to that of in-channels in the first, third, fifth, and seventh blocks, while the out-channel numbers are the same as the in-channel numbers in the second, fourth, and sixth blocks. In ScHiCEDRN, the discriminator and generator were trained together to compete with each other to generate better, more realistic Hi-C output maps from the LR input maps.

Architectural difference between ScHiCEDRN and other deep learning methods
Even though ScHiCEDRN uses the same GAN framework as the two previous population Hi-C data enhancement methods: DeepHiC (Hong et al. 2020) and HiCSR (Dimmick 2020), the architecture of the generator and discriminator of ScHiCEDRN is different from that of DeepHiC and HiCSR. The generator of ScHiCEDRN has 32 residual blocks versus 5 residual blocks of DeepHiC and HiCSR. On one hand, each residual block of the generator of ScHiCEDRN does not have the batch normalization layers that DeepHiC and HiCSR use ( Fig. 1B), On another one, each residual block of the generator of ScHiCEDRN includes a multiplication layer that DeepHiC and HiCSR do not have. Furthermore, there is some difference in the other convolutional layers before and after the residual blocks in the generators of three methods. Moreover, the discriminator of ScHiCEDRN has seven convolutional blocks while DeepHiC has five customized convolution blocks and HiCSR uses only three plain convolution layers in their discriminators.
Another two deep learning methods for Hi-C data enhancement (HiCPlus and Loopenhence) do not use the GAN The whole architecture of ScHiCEDRN: the generator consists of a convolutional layer, 32 residual blocks, and a convolutional layer that takes a LR single-cell Hi-C contact map as input to generate a higher (super)-resolution single-cell Hi-C map (HR Sc-HiC map) as output; the discriminator, containing a series of seven convolution blocks, is fed with both the predicted HR Sc-HiC map and the real HR Sc-HiC map to classify them into two categories (predicted/fake or real). framework. HiCPlus (Dimmick 2020) uses only three convolution blocks to denoise the input contact map, while Loopenhance employs a U-Net architecture to enhance a single-cell Hi-C map ).

Loss function of ScHiCEDRN
Equation (1) describes the loss function for the generator of ScHiCEDRN. It contains the mean squared error (MSE) between predicted values and target values [Equation (2)], the perceptual loss used by the VGG19 CNN (VGG) (Ledig et al. 2017) to improve image resolution that can be obtained by Equation (3) to measure the Euclidean distance between the generated matrices and the target matrices, and the total variation (TV) loss [Equation (4)]. Moreover, the loss function of the generator of ScHiCEDRN has a fourth item, adversarial loss (AD) [Equation (5)] to quantify the classification error of the discriminator that distinguishes the generated fake HR matrices from the real HR matrices. The details of the loss function are described as follows.
The loss function for the generator of ScHiCEDRN is defined by Equation (1) as follows, where a, b, and c, are all scalar weights in the range from 0 to 1.
Both l MSE and l VGG are the pixel-wise MSE loss calculated by Equations (2) and (3) respectively, which is the most widely used optimization target for enhancing image resolution. MSE loss function computes the average squared difference [Equation (2)] between two HR matrices, i.e. one (Ŷ i ) generated by the generator, and another one (Y i )-the real target matrix. VGG loss function [Equation (3)] calculates the average squared difference between two reconstructed HR matrices that are built by a pre-trained VGG network fed with both the one (Ŷ i ) generated by ScHiCEDRN generator and the real matrix (Y i ).
The TV loss function usually is used to suppress the noise in the output. The total TV loss for the generator is computed by Equation (4), where the W is a weight scalar, F is the number of filters, C is the number of channels, H and W are the height and width in a tensor of such dimensions [F, C, H, W], respectively. Both h TV and w TV are the TV losses of the H and W dimensions, respectively.
Here, h TV and w TV are calculated in the similar way according to Equations (4.1) and (4.2), respectively. For h TV ,ŷ 2:i ð Þj , andŷ 1:iÀ1 ð Þ j are the same prediction outputs from the generator that remove the first and last rows, respectively. A similar calculation for w TV is repeated by removing the first and last column for the same HR matrix from the generator.
The AD loss function term [Equation (5)] used by the generator of ScHiCEDRN represents the probability of classification error of separating the generated fake HR matrices and real HR matrices. Here, DðŶ i Þ is the probability that a generated HR matrix (Ŷ i ) is classified as a real HR matrix.
The discriminator of ScHiCEDRN uses the binary cross entropy (BCE) loss function [Equation (6)] to penalize misclassifying fake HR matrices (Ŷ i ) as real HR matrices (Y i ). The total loss [Equation (7)] for the discriminator consists of two parts: the BCE losses for the classification of both fake HR matrices and real HR matrices.

Training of ScHiCEDRN
For the generator of ScHiCEDRN, each batch of 40 Â 40 binsize LR single-cell Hi-C contact sub-matrices with 0.75, 0.45, 0.1, and 0.02 down-sampling ratios cropped from full LR single-cell Hi-C contact matrices was used as input to generate the corresponding HR single-cell Hi-C sub-matrices of the same dimension (Fig. 1B). The corresponding HR single-cell Hi-C contact sub-matrices cropped from the original singlecell Hi-C contact matrices without any down-sampling were used as labels. The discriminator of ScHiCEDRN was fed with both the generated HR single-cell Hi-C sub-matrices and the real HR ones to classify them into two categories: generated (fake) and real (Fig. 1B). The loss function of ScHiCEDRN was minimized by the Adam optimizer to adjust the weights. ScHiCEDRN was trained on the training data and validated on the validation data. After training, the full down-sampled single-cell Hi-C chromosomal contact sub-matrices in the different single-cell Hi-C test datasets (human_cell_1_test_data, human_cell s_2_3_test_data and drosophila_cells_test_data) as well as the full down-sampled Hi-C chromosomal contact sub-matrices in population Hi-C test datasets (human_population_test_ data and drosophila_population_test_data) were cropped as 40 Â 40 sub-matrices with zero padding if necessary for the pre-trained model to predict HR sub-matrices. The predicted sub-matrices were then assembled into the full matrices to be compared with the original, real HR matrices to evaluate their quality.
SNR is the ratio of the clean signals (y i;j ) in a target contact matrix to the difference between clean signals and noisy ones (x i;j ) in the corresponding input matrix to evaluate how much the input has been improved. The higher the value, the more noise is removed, and more improvement is obtained.
SSIM calculates the similarities between two images (i.e. a generated matrix and a true matrix) by a moving convolution window. The implementation of SSIM in DeepHiC (Hong et al. 2020) was used in this work [Equation (9)]. Constants C 1 and C 2 were set to the default values ð0:01Þ 2 and ð0:03Þ 2 respectively. The means (l x and l y ), variances (r 2 x and r 2 y ), and covariance (r xy ) were computed via a Gaussian filter.
PSNR measures the ratio of the maximum signal power to the power of corrupting noise in two images (i.e. generated contact matrix and target contact matrix), the higher the value, the more noise is removed, which is defined in Equation (10).

Hi-C reproducibility metrics
Two Hi-C specific reproducibility metrics: GenomeDISCO (Ursu et al. 2018) and HiCRep (Yang et al. 2017) were used to measure the enhancement on single-cell Hi-C contact matrices, which may provide a more biologically relevant measure compared to the standard image-based evaluation metrics. GenomeDISCO was designed to compare contact matrices of 3D genome structures computed from Hi-C data. It utilizes a random walk on a graph generated from contact matrices to obtain a concordance score. Larger concordance values indicate better quality of reproducibility of the Hi-C experiments. HiCRep is implemented in an R package and considers spatial features like distance dependence or domain structures to evaluate the reproducibility of Hi-C data.

ScHiCEDRN outperforms the existing methods in terms of image-based metrics
ScHiCEDRN and the four existing deep learning methods (HiCPlus, DeepHiC, HiCSR, and Loopenhance) are first compared on single-cell Hi-C contact maps of Human cell 1 (human_cell_1_test_data) in terms of the same four imagebased evaluation metrics (PSNR, SSIM, MSE, and SNR) ( Table 1). The single-cell Hi-C data in human_cell_1_test_ data were down-sampled at different ratios of 0.75, 0.45, 0.1, and 0.02 as input for these methods to generate enhanced data. The results show that the method ScHiCEDRN consistently performs better than the other methods at different down-sampling ratios and in terms of all the evaluation metrics. Particularly, at a very low down-sampling ratio of 0.02, it still performs well, indicating it is robust against the high noise in the data. Similar results are observed for two Drosophila cells on drosophia_cells_test_data (see Supplementary Tables S1 and S2). The results show that HiCPlus, DeepHiC, and HiCSR originally designed for improving bulk Hi-C data do not perform as well as ScHiCEDRN on single-cell Hi-C data. While Loopenhance was originally designed to impute a single-cell Hi-C contact map, its performance is substantially worse than ScHiCEDRN's on predicting raw experimental HR single-cell Hi-C data that are not down-sampled in this experiment, even though it may work well for predicting specially processed single-cell Hi-C data .
As an example, Fig. 2 shows a visual comparison of predicted contact matrices (heatmaps) for the region (9.60-11.20 Mb) of Chromosome 2 and the region (76.80-78.40 Mb) of Chromosome 12 of Human cell 1 at a down-sampled ratio of 0.75. The sub-regions (9.92-10.40 Mb) of Chromosome 2 ( Fig. 2A) and (77.12-77.60 Mb) for Chromosome 12 (Fig. 2B) with high contrast between the methods are highlighted by squares in the first and third rows of Fig. 2. When zooming in on the two subregions (the second and fourth rows of Fig. 2), it is clearly observed that ScHiCEDRN generates matrices more like the target matrices than the other methods.
Because three methods (HiCPlus, DeepHiC, and HiCSR) were mainly designed to impute population Hi-C data that contains more chromosomal contacts, we compared the five methods on the human and Drosophila population Hi-C datasets (human_population_test_data and drosophia_popu-lation_test_data). The results are shown in Supplementary  Table S3. ScHiCEDRN designed and trained to impute a single-cell Hi-C data map still outperforms the other methods on the population Hi-C data, indicating that it can generalize well from single-cell Hi-C data to population Hi-C data.

ScHiCEDRN outperforms existing models in terms of Hi-C reproducibility metrics
We also evaluated our method with the existing methods using two Hi-C reproducibility metrics: GenomeDISCO and HiCRep. GenomeDISCO (Ursu et al. 2018), i.e. DIfference between Smoothed COntact maps, measures the concordance of a pair of contact maps generated by chromosome conformation capture experiments, such as Hi-C. HiCRep (Yang et al. 2017) can capture spatial features, such as the distance dependence and domain structures, to assess the Hi-C data reproducibility in genome wide chromatin interactions. As shown in Fig. 3A, ScHiCEDRN has higher GenomeDICSO scores than the other four methods at different down-sampled ratios on human_cell_1_test_data. The similar results are observed in terms of HiCRep metric (Fig. 3B). Overall, in terms of both metrics, ScHiCEDRN is more accurate and robust than the other methods. The similar results on one of the drosophila test datasets (drosophila_cells_test_data) are provided in the Supplementary Material (see Supplementary Fig.  S1). Due to training on randomly down-sampled data, GenomeDISCO scores of different deep learning methods with thousands or millions of free parameters may be sensitive to different down-sampled ratios and randomly sampled datasets in the inference phase. For instance, HiCPlus or HiCSR can have a higher GenomeDISCO score at a lower downsampled ratio than a higher ratio (Fig. 3A). In contrast, hicRep is much less sensitive to the down-sampled ratio than GenomeDISCO. In terms of hicRep, the methods generally have a higher score at a higher down-sampled ratio with only one exception (Fig. 3B), i.e. DeepHiC has a higher hicRep score at lower down-sampled ratio of 0.02 than a higher ratio of 0.1. Overall, ScHiCEDRN is more robust against the change of down-sampling ratio and the variation in downsampled datasets and performs better across all downsampled ratios.

ScHiCEDRN performs well on unseen cells and cell lines
The difference between Drosophila cells and human cells is much larger than the difference between mouse cells and human cells. Existing methods were usually tested on different a Smaller the ratio, the sparser (noisier) the input data. The enhanced matrices are compared with the target matrices (labels) in terms of several imagebased metrics. "*" and "**" denote the best and second-best results in bold, respectively.  Wang et al.
cell lines across human cells and mouse cells, but few previous studies attempted to investigate if the methods trained on the cells of one species (e.g. human) can be used to improve the Hi-C data of another very distant species (e.g. Drosophila).
To assess the robustness of the methods, ScHiCEDRN and the other four methods first were trained on the same dataset (human_cell_1_training_data) of Human cell 1 of the human cell line, and then blindly tested on Chromosomes 2, 6, 10, and 12 of Human cells 2 and 3 from the same cell line (human_cells_2_3_test_data). At last, all these methods were also blindly tested on Chromosomes chr2L and chrX of two different Drosophila cells (drosophila_cells_test_data).  The PSNR (Fig. 4A and C) and SSIM ( Fig. 4B and D) scores of the five methods on the down-sampled test Hi-C data of the human cells and Drosophila cells unseen in the training data are reported in Fig. 4. ScHiCEDRN consistently performs better than the other methods on the human cells and Drosophila cells (Fig. 4 and Supplementary Fig. S2, respectively). The results show that ScHiCEDRN generalizes better from one cell to another cell of the same species (e.g. human) or from one cell of one species (human) to another cell of another species (Drosophila) than the other methods. The PSNR and SSIM scores generally decrease with the decrease of the down-sampled ratios for three methods: ScHiCEDRN, DeepHiC, and HiCSR, among of which ScHiCEDRN consistently gets higher scores.
Similarly, the HiCRep and GenomeDISCO scores of the five methods on two different cell lines of two different test datasets (human_cells_2_3_test_data and drosophila_cell-s_test_data) in Fig. 5 and Supplementary Fig. S3 show that ScHiCEDRN performs better than others, demonstrating that it works well on other cells of the same species and even cells of a different species not used in training. Comparing the GenomeDISCO scores of the five methods on the human cell ( Fig. 5C) with those on the Drosophila cell (Fig. 5D), it is surprising to see some methods, such as ScHiCEDRN and HiCSR, have higher GenomeDISCO scores on the Drosophila cell than on the human cell that comes from the same species (i.e. human) that ScHiCEDRN was trained on. One possible reason is that the initial GenomeDISCO scores of the downsampled data of the Drosophila cell before applying any method to enhance it are higher than the human cell (e.g. the initial GenomeDISCO scores is 0.8636 for Drosophila cell 1, higher than 0.8202 for Human cell 2 at 0.75 down-sampled ratio). The difference in the quality of Hi-C data of human cells and Drosophila cells is partly related to the practical difference in applying single-cell Hi-C techniques to them (Lee et al. 2019, Ulianov et al. 2021. Another possible reason is that GenomeDISCO score is very sensitive to the random variation in the data as seen in Fig. 3A, leading to some unexpected fluctuation.

ScHiCEDRN enhanced contact matrices effectively recover chromosomal features, such as TADs across cells and cell lines (types)
The TADs of chromosomes can be identified from chromosomal contact matrices of population cells by the insulation score method (Crane et al. 2015). However, it is much harder to identify TADs from single-cell Hi-C contact matrices. Here, we follow the similar procedure in  to use the insulation score to identify TADs from chromosomal contact matrices. An insulation score vector is  Wang et al.
calculated by sliding a window across the diagonal of a chromosome contact matrix, each number in which indicates if a chromosomal sub-region (fragment) is in a TAD boundary.
All the methods were blindly tested on two different test datasets (human_cells_2_3_test_data and drosophila_cells_test_data) to enhance their chromosomal contact matrices. The TADs features were extracted from the matrices generated by ScHiCEDRN and the other four methods, then they were compared with the TADs features extracted from the original HR single-cell Hi-C data.
The difference between the insulation score vector for each contact matrix enhancement method and the insulation score vector of the original HR contact matrix is measured by the L2 norm dissimilarity metric (Fig. 6). The smaller the L2 norm value indicates the better TAD result that the method yields. On all the different cells and cell types (lines), ScHiCEDRN performs better than all the other methods. The L2 norm of ScHiCEDRN for the two human cells ranges from 32 to 35 and for the two Drosophila cells ranges from 14 to 20, which appear to be reasonable. Supplementary Fig.  S4 visually compares TADs extracted from the single-cell chromosomal contact matrices enhanced by the five methods and the Target matrix (the real HR contact matrix). The TADs were identified and visualized by GenomeFlow (Trieu et al. 2019). It is shown that the TADs identified from the matrix enhanced by ScHiCEDRN are more similar to the ones from the Target than the other four methods. Supplementary Fig. S5 shows that the aggregated insulation score vector computed from the matrices enhanced by ScHiCEDRN is similar to the one from the real HR matrix.
Moreover, we compared the L2 norm of the insulation score vectors for the five methods on the population Hi-C data ( Supplementary Fig. S6). The results show that ScHiCEDRN still works better than the other methods. Its L2 norm for the Human population Hi-C data and Drosophila population Hi-C data is $28 and $12, respectively, each of which is lower than that for the single-cell data counterpart. The better results obtained on the population Hi-C data than on the single-cell Hi-C data may be because the former provides more reliable information for the TAD recovery than the latter.

Comparison of 3D chromosomal structures reconstructed from contact matrices generated by different methods
We evaluate how well the chromosomal contact matrices enhanced by the different methods can be used to reconstruct 3D chromosome structures for Chromosomes 2, 6, 10, and 12 of Human cell 1 in human_cell_1_test_data. The 3D chromosome structures were reconstructed by 3DMax  (Oluwadare et al. 2018). The similarity between the 3D structures reconstructed from the enhanced matrices and the original HR matrices was calculated by a python package of TMscore (Zhang and Skolnick 2004) at https://github.com/ Dapid/tmscoring. Distance constraints from each region of 1.6 Mb of each test chromosome were extracted from the enhanced matrices or the original HR matrix as input for 3DMax to build three 3D chromosomal structures. The box plots of TM-scores of the 3D chromosome structures in comparison with the structures reconstructed from the original matrices for the five methods (the higher values the better results) are shown in Fig. 7. ScHiCEDRN performs better than the other methods for three (Chromosomes 2, 6, and 12) out of four chromosomes (Fig. 7A, B and D). As an example, the reconstructed 3D conformations with different methods aligned against the conformation reconstructed from the real single-cell Hi-C data for the same region 73.6-78.4 Mb of Chromosome 12 are provided in Supplementary Fig. S7. It shows that the 3D chromosome conformations reconstructed from the Hi-C data enhanced by ScHiCEDRN and DeepHiC are more similar to the one reconstructed from the real singlecell Hi-C data than the other methods. This visualized example is consistent with the overall results in Fig. 7D for Chromosome 12. It is worth noting that reconstructing 3D chromosome conformations from very sparse single-cell Hi-C data is still a very challenging task and the metric of comparing 3D chromosome conformations is not mature, the analysis in Fig. 7 is not a thorough assessment of the capability of the methods for improving single-cell Hi-C data to enhance 3D conformation reconstruction.

Conclusion and future work
Improving the quality of sparse and noisy single-cell Hi-C data is an important and challenging task. We developed a deep learning method ScHiCEDRN based on both deep residual networks and generative adversarial networks to enhance single-cell Hi-C chromosomal contact matrices. We also provided a rigorous benchmark of the new method and four existing methods on enhancing single-cell Hi-C data. We demonstrate that ScHiCEDRN can generalize well across individual cells of the same cell line (type) or even between different cell types of two very different species (e.g. human versus Drosophila), indicating the general patterns in the single-cell Hi-C data have been learned by the method to enhance the data.
Tested on the sparse single-cell Hi-C datasets of human and Drosophila at different noisy and sparse levels controlled by four down-sampled ratios (0.75, 0.45, 0.1, and 0.02), ScHiCEDRN outperforms the existing methods consistently in terms of the standard image-based evaluation and Hi-C data reproducibility metrics. Due to the extreme sparsity of single-cell Hi-C data, the identification of TAD boundaries and chromatin loops and the reconstruction of 3D chromosomal conformation from single-cell Hi-C data are much more difficult and challenging than on population (bulk) Hi-C data. Even though it may take several years to fully address the issues related to single-cell Hi-C data, they still could be roughly analysed by the methods initially developed for population Hi-C data, such as the application of the insulation scores to TAD-like boundaries identification from single-cell Hi-C data in . According to the TAD insulation scores, ScHiCEDRN generates better chromosomal contact matrices for either recovering TAD boundaries or reconstructing 3D chromosomal structures, indicating its architecture of combining the deep residual networks and GAN may be better at producing more biologically relevant contact matrices than the existing GAN-based methods. Although ScHiCEDRN can get better results than others on the TADs calling and 3D chromosomal structure reconstruction, the evaluation of these methods in terms of these two metrics is still rough and premature because the metrics are still not defined and quantified well. The more detailed and robust analysis of single-cell Hi-C data enhanced by these methods including 3D chromosomal conformation reconstruction will be explored in the future.

Supplementary data
Supplementary data are available at Bioinformatics online.

Conflict of interest
None declared.

Funding
This work was partly supported by the William and Nancy Thompson Professorship; and the NextGen Precision Health seed fund.